55 research outputs found

    Estimating the impact of I/O forwarding usage on application performance

    In high-performance computing architectures, the I/O forwarding technique is often used to alleviate contention in the access to the shared parallel file system servers. Intermediate I/O nodes are placed between compute nodes and these servers, and are responsible for forwarding requests. In this scenario, it is important to properly distribute the available I/O nodes among the running jobs to promote an efficient usage of these resources and improve I/O performance. However, the impact different numbers of I/O nodes have on an application's bandwidth depends on its characteristics. In this report, we explore the idea of predicting application performance by extracting information from a coarse-grained aggregated trace of a previous execution, and then using this information to match each of the application's I/O phases to an equivalent benchmark, for which performance results are available. We test this idea by applying it to five different applications over three case studies, and find a mean error of approximately 20%. We extensively discuss the obtained results and the limitations of the approach, pointing to opportunities for future work.
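
    As a minimal illustration of this phase-matching idea, here is a sketch (with hypothetical feature names and a made-up benchmark database, not the report's actual metrics) of predicting a phase's bandwidth from the closest pre-measured benchmark:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    request_size: float  # mean request size (bytes), from the trace
    n_processes: int     # processes doing I/O in this phase
    volume: float        # total bytes moved in the phase

@dataclass
class Benchmark:
    name: str
    request_size: float
    n_processes: int
    volume: float
    bandwidth: dict      # measured MB/s per number of I/O nodes

def distance(p: Phase, b: Benchmark) -> float:
    # Relative difference over the shared features (all assumed positive).
    pairs = [(p.request_size, b.request_size),
             (p.n_processes, b.n_processes),
             (p.volume, b.volume)]
    return sum(abs(x - y) / max(x, y) for x, y in pairs)

def predict_bandwidth(phase: Phase, benchmarks: list, io_nodes: int) -> float:
    # The prediction is the measured bandwidth of the closest benchmark.
    closest = min(benchmarks, key=lambda b: distance(phase, b))
    return closest.bandwidth[io_nodes]
```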

    Implementation and testing of a Weighted Fair Queuing scheduler for I/O requests

    This report describes the work conducted by Alessa Mayer during a two-month internship at the Inria center of the University of Bordeaux, as a member of the Tadaam team. During her internship, her advisors were Luan Teylo and Francieli Boito. The goal of the internship was to implement and test the Weighted Fair Queuing (WFQ) scheduling algorithm applied to I/O requests, and to integrate it into the AGIOS I/O scheduling library. That scheduler will be used to implement the I/O Sets method, proposed by members of the team in a recent paper.
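
    For illustration, a minimal, self-contained model of the WFQ idea applied to I/O requests follows (a Python sketch, not AGIOS's actual C API): each application is a flow with a weight, and requests are dispatched in increasing order of a virtual finish tag, so each flow receives service proportional to its weight.

```python
import heapq

class WFQScheduler:
    """Minimal model of Weighted Fair Queuing for I/O requests."""

    def __init__(self):
        self.queue = []        # heap of (finish_tag, seq, request)
        self.last_finish = {}  # last finish tag per flow
        self.vtime = 0.0       # virtual clock (simplified advancement)
        self.seq = 0           # insertion counter to break ties

    def add_request(self, flow, weight, size, request):
        # A request "finishes" size/weight virtual time units after the
        # later of the current virtual time and the flow's last finish.
        start = max(self.vtime, self.last_finish.get(flow, 0.0))
        finish = start + size / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.queue, (finish, self.seq, request))
        self.seq += 1

    def next_request(self):
        if not self.queue:
            return None
        finish, _, request = heapq.heappop(self.queue)
        # Simplification: advance the virtual clock to the dispatched tag.
        self.vtime = finish
        return request

sched = WFQScheduler()
sched.add_request("app_A", weight=2.0, size=1024, request="A-0")
sched.add_request("app_B", weight=1.0, size=1024, request="B-0")
print(sched.next_request())  # "A-0": the heavier-weighted flow goes first
```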

    The role of storage target allocation in applications' I/O performance with BeeGFS

    Parallel file systems are at the core of HPC I/O infrastructures. These systems reduce the I/O time of applications by separating files into fixed-size chunks and distributing them across multiple storage targets. Therefore, the I/O performance experienced with a PFS is directly linked to the capacity to retrieve these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes, and storage targets. Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments and also provide valuable information for future work on storage target allocation and stripe count tuning.
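
    The role of the stripe count can be illustrated with the usual round-robin striping model (a simplified sketch, not BeeGFS's internal code): chunk i of a file lands on storage target i mod stripe_count, so the stripe count bounds how many targets can serve a request in parallel.

```python
def chunk_targets(file_size: int, chunk_size: int, stripe_count: int):
    """Round-robin striping: chunk i of a file is stored on storage
    target (i % stripe_count). Returns the target index of each chunk."""
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    return [i % stripe_count for i in range(n_chunks)]

# Example: a 4 MiB file with 512 KiB chunks striped over 4 targets
# spreads its 8 chunks as [0, 1, 2, 3, 0, 1, 2, 3], so a large write
# can proceed on 4 targets in parallel; with stripe_count=8 it could
# use all 8 targets at once.
print(chunk_targets(4 * 2**20, 512 * 2**10, 4))
```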

    IO-SETS: Simple and efficient approaches for I/O bandwidth management

    One of the main performance issues faced by high-performance computing platforms is the congestion caused by concurrent I/O from applications. When this happens, the platform's overall performance and utilization are harmed. Among the extensive work in this field, I/O scheduling is the essential solution to this problem. The main drawback of current techniques is the amount of information they require about applications, which compromises their applicability. In this paper, we propose a novel method for I/O management, IO-SETS. We present its potential through a scheduling heuristic called SET-10, which requires minimal information and can be easily implemented.
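
    As a loose illustration only (the grouping criterion below is an assumption made for this sketch, not taken from the abstract): a SET-10-style heuristic could partition applications into sets by the base-10 order of magnitude of a single characteristic, such as the mean time between their I/O phases.

```python
import math

def set_id(mean_time_between_io: float) -> int:
    # Assumed grouping criterion: the base-10 order of magnitude of the
    # mean time (in seconds) between an application's I/O phases.
    return math.floor(math.log10(mean_time_between_io))

# Applications in the same set would then be scheduled exclusively (one
# at a time), while different sets share bandwidth; only the grouping
# step is shown here, with made-up job characteristics.
jobs = {"A": 12.0, "B": 95.0, "C": 130.0, "D": 1500.0}
sets = {}
for name, t in jobs.items():
    sets.setdefault(set_id(t), []).append(name)
print(sets)  # {1: ['A', 'B'], 2: ['C'], 3: ['D']}
```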

    I/O performance of multiscale finite element simulations on HPC environments

    In this paper, we present MSLIO, a code that mimics the I/O behavior of multiscale simulations. Such an I/O kernel is useful for HPC research, as it can be executed more easily and efficiently than the full simulations when researchers are interested in the I/O load only. We validate MSLIO by comparing it to the I/O performance of an actual simulation, and we then use it to test possible improvements to the output routine of the MHM (Multiscale Hybrid Mixed) library.
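
    A generic I/O kernel of this kind can be sketched as follows (a minimal stand-in, not MSLIO's actual code): compute phases are replaced by sleeps of the same duration, and only the output phases are reproduced and timed.

```python
import os
import time

def io_kernel(n_steps: int, compute_time: float, output_bytes: int,
              out_dir: str = "out"):
    """Mimic a simulation's I/O load: sleep through each compute phase,
    then reproduce and time the following output phase."""
    os.makedirs(out_dir, exist_ok=True)
    payload = b"\0" * output_bytes
    for step in range(n_steps):
        time.sleep(compute_time)              # stand-in for computation
        t0 = time.perf_counter()
        with open(f"{out_dir}/step_{step}.dat", "wb") as f:
            f.write(payload)                  # output phase to measure
            f.flush()
            os.fsync(f.fileno())              # force data to storage
        print(f"step {step}: wrote {output_bytes} B "
              f"in {time.perf_counter() - t0:.3f} s")

io_kernel(n_steps=3, compute_time=0.1, output_bytes=1 << 20)
```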

    Arbitration policies for on-demand user-level I/O forwarding on HPC platforms

    I/O forwarding is a well-established and widely adopted technique in HPC to reduce contention in the access to storage servers and transparently improve I/O performance. Rather than having applications directly access the shared parallel file system, the forwarding technique defines a set of I/O nodes responsible for receiving application requests and forwarding them to the file system, thus reshaping the flow of requests. The typical approach is to statically assign I/O nodes to applications depending on the number of compute nodes they use, which is not necessarily related to their I/O requirements; this approach therefore leads to inefficient usage of these resources. This paper investigates arbitration policies based on the applications' I/O demands, represented by their access patterns. We propose a policy based on the Multiple-Choice Knapsack problem that seeks to maximize global bandwidth by giving more I/O nodes to the applications that will benefit the most. Furthermore, we propose a user-level I/O forwarding solution as an on-demand service capable of applying different allocation policies at runtime, for machines where this layer is not present. We demonstrate our approach's applicability through extensive experimentation and show it can transparently improve global I/O bandwidth by up to 85% in a live setup compared to the default static policy. This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. It has also received support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil. It is also partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant PID2019-107255GB, and by the Generalitat de Catalunya under contract 2014-SGR-1051. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Barcelona Supercomputing Center. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
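
    The arbitration idea can be illustrated with a textbook dynamic program for the Multiple-Choice Knapsack problem (a sketch with made-up bandwidth predictions, not the paper's implementation): each application contributes a class of (I/O-node count, predicted bandwidth) options, exactly one option is chosen per application, and the chosen node counts must not exceed the nodes available.

```python
def allocate_io_nodes(options, total_nodes):
    """Choose one (n_nodes, predicted_bandwidth) option per application
    so that node counts sum to at most total_nodes and the summed
    bandwidth is maximized (Multiple-Choice Knapsack, solved by DP)."""
    NEG = float("-inf")
    best = [0.0] + [NEG] * total_nodes  # best[c]: max bandwidth using c nodes
    trace = []                          # per-application backpointers
    for opts in options:
        new = [NEG] * (total_nodes + 1)
        pick = [None] * (total_nodes + 1)
        for used in range(total_nodes + 1):
            if best[used] == NEG:
                continue
            for i, (n, bw) in enumerate(opts):
                if used + n <= total_nodes and best[used] + bw > new[used + n]:
                    new[used + n] = best[used] + bw
                    pick[used + n] = (i, used)
        best = new
        trace.append(pick)
    c = max(range(total_nodes + 1), key=lambda k: best[k])
    total_bw = best[c]
    picks = [None] * len(options)
    for a in range(len(options) - 1, -1, -1):
        i, c = trace[a][c]              # option index, capacity before this app
        picks[a] = options[a][i]
    return picks, total_bw

# Two applications competing for 4 I/O nodes: the second scales better
# with extra nodes, so the policy gives them to it (made-up numbers).
opts = [[(1, 100.0), (2, 110.0)],
        [(1, 100.0), (2, 180.0), (3, 250.0)]]
print(allocate_io_nodes(opts, 4))  # ([(1, 100.0), (3, 250.0)], 350.0)
```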
